Improving Translation Models by Applying Asymmetric Learning

Authors

  • Setsuo Yamada
  • Masaaki Nagata
  • Kenji Yamada
Abstract

A statistical machine translation model has two components: a language model and a translation model. This paper describes how to improve the quality of the translation model by using the common word pairs extracted by two asymmetric learning approaches. One set of word pairs is extracted by Viterbi alignment using a translation model; the other set is extracted by Viterbi alignment using another translation model created by reversing the languages. The common word pairs are those that appear in both sets. We conducted experiments using English and Japanese. Our method improves the quality of an original translation model by 5.7%. The experiments also show that the proposed learning method improves word alignment quality independently of the training domain and the translation model. Moreover, we show that common word pairs are almost as useful as regular dictionary entries for training purposes.
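The extraction of common word pairs described in the abstract can be illustrated with a minimal sketch. It assumes the two Viterbi alignments have already been computed and are available as lists of word pairs. The function name `common_word_pairs` and the toy English and romanized Japanese words are hypothetical and serve only as an illustration; this is not the authors' implementation.

```python
def common_word_pairs(forward_links, reverse_links):
    """Return the word pairs found by both alignment directions.

    forward_links: (source_word, target_word) pairs from the Viterbi
        alignment of the source-to-target translation model.
    reverse_links: (target_word, source_word) pairs from the Viterbi
        alignment of the model trained with the languages reversed.
    """
    forward = set(forward_links)
    # Flip the reversed-direction pairs so both sets use (source, target) order.
    reverse_flipped = {(src, tgt) for (tgt, src) in reverse_links}
    # Keep only the pairs that both directions agree on.
    return forward & reverse_flipped


# Toy data (hypothetical): English -> Japanese and Japanese -> English links.
forward = [("house", "ie"), ("the", "ie"), ("big", "ookii")]
reverse = [("ie", "house"), ("ookii", "big"), ("ookii", "the")]

pairs = common_word_pairs(forward, reverse)
print(sorted(pairs))
```

In this toy case only the pairs that both directions propose survive the intersection. According to the abstract, such common pairs are then used to improve the training of the translation model.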


Similar resources

Improving Phrase-Based Translation via Word Alignments from Stochastic Inversion Transduction Grammars

We argue that learning word alignments through a compositionally-structured, joint process yields higher phrase-based translation accuracy than the conventional heuristic of intersecting conditional models. Flawed word alignments can lead to flawed phrase translations that damage translation accuracy. Yet the IBM word alignments usually used today are known to be flawed, in large part because I...


Applying boosting to statistical machine translation

Boosting is a general method for improving the accuracy of a given learning algorithm under certain restrictions. In this work, AdaBoost, one of the most popular boosting algorithms, is adapted and applied to statistical machine translation. The appropriateness of this technique in this scenario is evaluated on a real translation task. Results from preliminary experiments confirm that statistic...


Risk Management in Oil Market: A Comparison between Multivariate GARCH Models and Copula-based Models

High price volatility and the risk are the main features of commodity markets. One way to reduce this risk is to apply the hedging policy by future contracts. In this regard, in this paper, we will calculate the optimal hedging ratios for OPEC oil. In this study, besides the multivariate GARCH models, for the first time we use conditional copula models for modelling dependence struc...


Improved Arabic Dialect Classification with Social Media Data

Arabic dialect classification has been an important and challenging problem for Arabic language processing, especially for social media text analysis and machine translation. In this paper we propose an approach to improving Arabic dialect classification with semi-supervised learning: multiple classifiers are trained with weakly supervised, strongly supervised, and unsupervised data. Their comb...


Attention is All you Need

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. E...




Publication date: 2003